Goto

Collaborating Authors

 molecular weight


All Claims Are Equal, but Some Claims Are More Equal Than Others: Importance-Sensitive Factuality Evaluation of LLM Generations

Wanner, Miriam, Azzopardi, Leif, Thomas, Paul, Dan, Soham, Van Durme, Benjamin, Craswell, Nick

arXiv.org Artificial Intelligence

Existing methods for evaluating the factuality of large language model (LLM) responses treat all claims as equally important. This results in misleading evaluations when vital information is missing or incorrect as it receives the same weight as peripheral details, raising the question: how can we reliably detect such differences when there are errors in key information? Current approaches that measure factuality tend to be insensitive to omitted or false key information. To investigate this lack of sensitivity, we construct VITALERRORS, a benchmark of 6,733 queries with minimally altered LLM responses designed to omit or falsify key information. Using this dataset, we demonstrate the insensitivities of existing evaluation metrics to key information errors. To address this gap, we introduce VITAL, a set of metrics that provide greater sensitivity in measuring the factuality of responses by incorporating the relevance and importance of claims with respect to the query. Our analysis demonstrates that VITAL metrics more reliably detect errors in key information than previous methods. Our dataset, metrics, and analysis provide a foundation for more accurate and robust assessment of LLM factuality.


Exploring Modularity of Agentic Systems for Drug Discovery

van Weesep, Laura, Genheden, Samuel, Engkvist, Ola, Sjölund, Jens

arXiv.org Artificial Intelligence

Large-language models (LLMs) and agentic systems present exciting opportunities to accelerate drug discovery. In this study, we examine the modularity of LLM-based agentic systems for drug discovery, i.e., whether parts of the system such as the LLM and type of agent are interchangeable, a topic that has received limited attention in drug discovery. We compare the performance of different LLMs and the effectiveness of tool-calling agents versus code-generating agents. Our case study, comparing performance in orchestrating tools for chemistry and drug discovery using an LLM-as-a-judge score, shows that Claude-3.5-Sonnet, Claude-3.7-Sonnet and GPT-4o outperform alternative language models such as Llama-3.1-8B, Llama-3.1-70B, GPT-3.5-Turbo, and Nova-Micro. Although we confirm that code-generating agents outperform the tool-calling ones on average, we show that this is highly question- and model-dependent. Furthermore, the impact of replacing system prompts is dependent on the question and model, underscoring that even in this particular domain one cannot just replace components of the system without re-engineering. Our study highlights the necessity of further research into the modularity of agentic systems to enable the development of reliable and modular solutions for real-world problems.


xChemAgents: Agentic AI for Explainable Quantum Chemistry

Polat, Can, Tuncel, Mehmet, Kurban, Mustafa, Serpedin, Erchin, Kurban, Hasan

arXiv.org Artificial Intelligence

Recent progress in multimodal graph neural networks has demonstrated that augmenting atomic XYZ geometries with textual chemical descriptors can enhance predictive accuracy across a range of electronic and thermodynamic properties. However, naively appending large sets of heterogeneous descriptors often degrades performance on tasks sensitive to molecular shape or symmetry, and undermines interpretability. xChemAgents proposes a cooperative agent framework that injects physics-aware reasoning into multimodal property prediction. xChemAgents comprises two language-model-based agents: a Selector, which adaptively identifies a sparse, weighted subset of descriptors relevant to each target, and provides a natural language rationale; and a Validator, which enforces physical constraints such as unit consistency and scaling laws through iterative dialogue. On standard benchmark datasets, xChemAgents achieves up to a 22% reduction in mean absolute error over the state-of-the-art baselines, while producing faithful, human-interpretable explanations. Experiment results highlight the potential of cooperative, self-verifying agents to enhance both accuracy and transparency in foundation-model-driven materials science. The implementation and accompanying dataset are available at https://github.com/KurbanIntelligenceLab/xChemAgents.


Interpretable machine-learning for predicting molecular weight of PLA based on artificial bee colony optimization algorithm and adaptive neurofuzzy inference system

Masoumi, Amir Pouya, Creedon, Leo, Ghosh, Ramen, Munir, Nimra, McMorrow, Ross, McAfee, Marion

arXiv.org Artificial Intelligence

This article discusses the integration of the Artificial Bee Colony (ABC) algorithm with two supervised learning methods, namely Artificial Neural Networks (ANNs) and Adaptive Network-based Fuzzy Inference System (ANFIS), for feature selection from Near-Infrared (NIR) spectra for predicting the molecular weight of medical-grade Polylactic Acid (PLA). During extrusion processing of PLA, in-line NIR spectra were captured along with extrusion process and machine setting data. With a dataset comprising 63 observations and 512 input features, appropriate machine learning tools are essential for interpreting data and selecting features to improve prediction accuracy. Initially, the ABC optimization algorithm is coupled with ANN/ANFIS to forecast PLA molecular weight. The objective functions of the ABC algorithm are to minimize the root mean square error (RMSE) between experimental and predicted PLA molecular weights while also minimizing the number of input features. Results indicate that employing ABC-ANFIS yields the lowest RMSE of 282 Da and identifies four significant parameters (NIR wavenumbers 6158 cm-1, 6310 cm-1, 6349 cm-1, and melt temperature) for prediction. These findings demonstrate the effectiveness of using the ABC algorithm with ANFIS for selecting a minimal set of features to predict PLA molecular weight with high accuracy during processing


Many-Shot In-Context Learning for Molecular Inverse Design

Moayedpour, Saeed, Corrochano-Navarro, Alejandro, Sahneh, Faryad, Noroozizadeh, Shahriar, Koetter, Alexander, Vymetal, Jiri, Kogler-Anele, Lorenzo, Mas, Pablo, Jangjou, Yasser, Li, Sizhen, Bailey, Michael, Bianciotto, Marc, Matter, Hans, Grebner, Christoph, Hessler, Gerhard, Bar-Joseph, Ziv, Jager, Sven

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated great performance in few-shot In-Context Learning (ICL) for a variety of generative and discriminative chemical design tasks. The newly expanded context windows of LLMs can further improve ICL capabilities for molecular inverse design and lead optimization. To take full advantage of these capabilities we developed a new semi-supervised learning method that overcomes the lack of experimental data available for many-shot ICL. Our approach involves iterative inclusion of LLM generated molecules with high predicted performance, along with experimental data. We further integrated our method in a multi-modal LLM which allows for the interactive modification of generated molecular structures using text instructions. As we show, the new method greatly improves upon existing ICL methods for molecular design while being accessible and easy to use for scientists.


Automated Molecular Concept Generation and Labeling with Large Language Models

Zhang, Shichang, Xia, Botao, Zhang, Zimin, Wu, Qianli, Sun, Fang, Hu, Ziniu, Sun, Yizhou

arXiv.org Artificial Intelligence

Artificial intelligence (AI) is significantly transforming scientific research. Explainable AI methods, such as concept-based models (CMs), are promising for driving new scientific discoveries because they make predictions based on meaningful concepts and offer insights into the prediction process. In molecular science, however, explainable CMs are not as common compared to black-box models like Graph Neural Networks (GNNs), primarily due to their requirement for predefined concepts and manual label for each instance, which demand domain knowledge and can be labor-intensive. This paper introduces a novel framework for Automated Molecular Concept (AutoMolCo) generation and labeling. AutoMolCo leverages the knowledge in Large Language Models (LLMs) to automatically generate predictive molecular concepts and label them for each molecule. Such procedures are repeated through iterative interactions with LLMs to refine concepts, enabling simple linear models on the refined concepts to outperform GNNs and LLM in-context learning on several benchmarks. The whole AutoMolCo framework is automated without any human knowledge inputs in either concept generation, labeling, or refinement, thereby surpassing the limitations of extant CMs while maintaining their explainability and allowing easy intervention. Through systematic experiments on MoleculeNet and High-Throughput Experimentation (HTE) datasets, we demonstrate that the AutoMolCo-induced explainable CMs are beneficial and promising for molecular science research.


Twins in rotational spectroscopy: Does a rotational spectrum uniquely identify a molecule?

Schwarting, Marcus, Seifert, Nathan A., Davis, Michael J., Blaiszik, Ben, Foster, Ian, Prozument, Kirill

arXiv.org Artificial Intelligence

Rotational spectroscopy is the most accurate method for determining structures of molecules in the gas phase. It is often assumed that a rotational spectrum is a unique "fingerprint" of a molecule. The availability of large molecular databases and the development of artificial intelligence methods for spectroscopy makes the testing of this assumption timely. In this paper, we pose the determination of molecular structures from rotational spectra as an inverse problem. Within this framework, we adopt a funnel-based approach to search for molecular twins, which are two or more molecules, which have similar rotational spectra but distinctly different molecular structures. We demonstrate that there are twins within standard levels of computational accuracy by generating rotational constants for many molecules from several large molecular databases, indicating the inverse problem is ill-posed. However, some twins can be distinguished by increasing the accuracy of the theoretical methods or by performing additional experiments.


AUTODIFF: Autoregressive Diffusion Modeling for Structure-based Drug Design

Li, Xinze, Wang, Penglei, Fu, Tianfan, Gao, Wenhao, Li, Chengtao, Shi, Leilei, Liu, Junhong

arXiv.org Artificial Intelligence

Structure-based drug design (SBDD), which aims to generate molecules that can bind tightly to the target protein, is an essential problem in drug discovery, and previous approaches have achieved initial success. However, most existing methods still suffer from invalid local structure or unrealistic conformation issues, which are mainly due to the poor leaning of bond angles or torsional angles. To alleviate these problems, we propose AUTODIFF, a diffusion-based fragment-wise autoregressive generation model. Specifically, we design a novel molecule assembly strategy named conformal motif that preserves the conformation of local structures of molecules first, then we encode the interaction of the protein-ligand complex with an SE(3)-equivariant convolutional network and generate molecules motif-by-motif with diffusion modeling. In addition, we also improve the evaluation framework of SBDD by constraining the molecular weights of the generated molecules in the same range, together with some new metrics, which make the evaluation more fair and practical. Extensive experiments on CrossDocked2020 demonstrate that our approach outperforms the existing models in generating realistic molecules with valid structures and conformations while maintaining high binding affinity.


PQA: Zero-shot Protein Question Answering for Free-form Scientific Enquiry with Large Language Models

Carrami, Eli M, Sharifzadeh, Sahand

arXiv.org Artificial Intelligence

We introduce the novel task of zero-shot Protein Question Answering (PQA) for free-form scientific enquiry. Given a previously unseen protein sequence and a natural language question, the task is to deliver a scientifically accurate answer. This task not only supports future biological research, but could also provide a test bed for assessing the scientific precision of large language models (LLMs). We contribute the first specialized dataset for PQA model training, containing 257K protein sequences annotated with 1.97M scientific question-answer pairs. Additionally, we propose and study several novel biologically relevant benchmarks for scientific PQA. Employing two robust multi-modal architectures, we establish an initial state-of-the-art performance for PQA and reveal key performance factors through ablation studies. Our comprehensive PQA framework, named Pika, including dataset, code, model checkpoints, and a user-friendly demo, is openly accessible on github.com/EMCarrami/Pika, promoting wider research and application in the field.


LLamol: A Dynamic Multi-Conditional Generative Transformer for De Novo Molecular Design

Dobberstein, Niklas, Maass, Astrid, Hamaekers, Jan

arXiv.org Artificial Intelligence

In fields like energy storage materials or medicinal chemistry, substances are key to technological advancement and progress: the success of these applications hinges on the specific properties of the materials. However, the processes of discovery and development of new materials often face practical and/or principal obstacles, such as unavailability of compounds or precursors, high production costs, and the need for extensive trials on the practical side, or limited data and/or experience, as well as biased expectations of designers and developers on the other hand. Generative models, a powerful category in machine learning, have the potential to address both of these issues simultaneously, as they can help focus our efforts a priori only on the most likely candidates. Many architectures related to creation of novel data points were developed in recent years, most notably Recurrent Neural Networks (RNN) [1], Generative Adversarial Networks (GAN) [2], Variational Autoencoders (VAE) [3] and Transformers [4]. The transformer architecture, especially, has revolutionized the fields of Natural Language Processing (NLP) [5] and other domains like computer vision [6]. The introduction of the General Pretrained Transformer (GPT) architecture led to significant advancements in generative natural language applications. Generative models have also been applied in the fields of medicine and material science to create new molecules with predefined features, a process known as conditional generation [7, 8]. This application can significantly accelerate the discovery of new candidate molecules. Although current generative models may not provide the optimal solution, they can greatly reduce the size of the chemical space that needs to be evaluated.